Reference Line Extraction from Form Documents with Complicated Backgrounds
Abstract
Form document analysis is one of the most essential tasks in document analysis and recognition, and one of its most fundamental and crucial steps is the extraction of reference lines, which are present in almost all form documents. This paper presents an efficient methodology for processing complicated grey-level form images. We construct a non-orthogonal wavelet with adjustable rectangular supports and propose algorithms that extract reference lines by applying the strip-growth method to the multiresolution wavelet sub-images. We compare this system with the popular Hough transform (HT)-based method and the novel orthogonal-wavelet-based method. The experiments show that the proposed algorithm achieves high performance and high speed on complicated form images. The system is also effective for form images with slight skew.
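The abstract does not specify the wavelet construction or the strip-growth rules, so the following Python sketch only illustrates the general idea under stated assumptions; it is not the authors' algorithm. A block average over a wide, one-row rectangle (a Haar-like scaling step, used here as an assumed stand-in for the paper's non-orthogonal wavelet) produces a coarse sub-image in which long horizontal rulings stay dark while short text strokes fade. A simple, hypothetical strip-growth pass then merges long dark runs in adjacent rows into strips, which lets a slightly skewed line grow into a single strip. The function names `rect_lowpass`, `dark_runs`, `grow_strips` and `extract_horizontal_lines`, the threshold, and the block sizes are all illustrative choices.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]          # grey levels: 0 = black ink, 255 = white paper
Strip = Tuple[int, int, int, int]  # (top_row, bottom_row, left_col, right_col), inclusive


def rect_lowpass(img: Image, bw: int, bh: int) -> Image:
    """Average non-overlapping bh x bw blocks (a Haar-like scaling step with a
    rectangular support; an assumed stand-in, not the paper's wavelet)."""
    rows, cols = len(img) // bh, len(img[0]) // bw
    out: Image = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [img[r * bh + i][c * bw + j] for i in range(bh) for j in range(bw)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dark_runs(row: List[float], thresh: float, min_run: int) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) spans of at least min_run pixels darker than thresh."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for c, v in enumerate(row + [thresh]):  # the sentinel closes a trailing run
        if v < thresh and start is None:
            start = c
        elif v >= thresh and start is not None:
            if c - start >= min_run:
                runs.append((start, c - 1))
            start = None
    return runs


def grow_strips(img: Image, thresh: float, min_run: int) -> List[Strip]:
    """Hypothetical strip growth: attach each long dark run to a strip that ended
    on the previous (or current) row and overlaps or touches it horizontally,
    so a slightly skewed line still grows into one strip."""
    strips: List[List[int]] = []  # mutable [top, bottom, left, right]
    for r, row in enumerate(img):
        for s, e in dark_runs(row, thresh, min_run):
            for st in strips:
                if st[1] >= r - 1 and s <= st[3] + 1 and e >= st[2] - 1:
                    st[1] = r
                    st[2], st[3] = min(st[2], s), max(st[3], e)
                    break
            else:  # no neighbouring strip found: start a new one
                strips.append([r, r, s, e])
    return [(top, bottom, left, right) for top, bottom, left, right in strips]


def extract_horizontal_lines(img: Image, bw: int = 4, bh: int = 1,
                             thresh: float = 128.0, min_len: int = 3) -> List[Strip]:
    """Smooth with a wide, short rectangle, grow strips on the coarse sub-image,
    then map strip bounds back to full-resolution pixel coordinates."""
    coarse = rect_lowpass(img, bw, bh)
    strips = grow_strips(coarse, thresh, min_run=min_len)
    return [(top * bh, (bottom + 1) * bh - 1, left * bw, (right + 1) * bw - 1)
            for top, bottom, left, right in strips]


if __name__ == "__main__":
    page = [[255.0] * 24 for _ in range(8)]
    for col in range(24):
        page[2][col] = 0.0   # one full-width ruling line
    page[5][5] = 0.0         # an isolated text-like dot, washed out by the block average
    print(extract_horizontal_lines(page))  # [(2, 2, 0, 23)]
```

Vertical reference lines could be handled the same way on the transposed image (for example `[list(col) for col in zip(*img)]`), with the returned row and column bounds swapped.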
Similar resources
Microsoft Word - john_icita_ell.rtf
In this paper, we present a fast and robust ellipse extraction method. The proposed method can extract ellipses with high accuracy and speed from images with complicated backgrounds. It consists of two parts. First, we extract arc segments from an ellipse approximated by short straight lines that are extracted by a fast line extraction algorithm. Second, the arc segments are used to calculate ...
Basic Test Framework for the Evaluation of Text Line Segmentation and Text Parameter Extraction
Text line segmentation is an essential stage in off-line optical character recognition (OCR) systems. It is a key stage because inaccurately segmented text lines will lead to OCR failure. Text line segmentation of handwritten documents is a complex and diverse problem, complicated by the nature of handwriting. Hence, text line segmentation is a leading challenge in handwritten document image processi...
Stroke-model-based character extraction from gray-level document images
Global gray-level thresholding techniques such as Otsu's method, and local gray-level thresholding techniques such as edge-based segmentation or the adaptive thresholding method are powerful in extracting character objects from simple or slowly varying backgrounds. However, they are found to be insufficient when the backgrounds include sharply varying contours or fonts in different sizes. A str...
Automatic Detection of Font Size Straight from Run Length Compressed Text Documents
Automatic detection of font size finds many applications in the area of intelligent OCRing and document image analysis, which has been traditionally practised over uncompressed documents, although in real life the documents exist in compressed form for efficient storage and transmission. It would be novel and intelligent if the task of font size detection could be carried out directly from the ...
An Efficient Recognition and Data Extraction Method for Table-Form Documents
In Asia, many documents processed in offices are table-form documents. Hence the automatic processing of table-form documents is an important issue of the office automation research. In this paper, we propose an efficient representation method for table-form documents. The representation method is based on three types of line segments. The line segments are normalized and sorted, hence the repr...
Publication date: 2003